Firefly Monte Carlo: Exact MCMC with Subsets of Data
Authors
Abstract
Markov chain Monte Carlo (MCMC) is a popular and successful general-purpose tool for Bayesian inference. However, MCMC cannot be practically applied to large datasets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC), an auxiliary variable MCMC algorithm that queries the likelihoods of only a potentially small subset of the data at each iteration, yet simulates from the exact posterior distribution, in contrast to recent proposals that are approximate even in the asymptotic limit. FlyMC is compatible with a wide variety of modern MCMC algorithms and requires only a lower bound on the per-datum likelihood factors. In experiments, we find that FlyMC generates samples from the posterior more than an order of magnitude faster than regular MCMC, opening up MCMC methods to larger datasets than were previously considered feasible.
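To make the auxiliary-variable idea concrete, here is a minimal sketch rather than the authors' implementation. It assumes a one-dimensional logistic regression with a standard normal prior, uses the Jaakkola-Jordan quadratic bound as the per-datum lower bound B_n <= L_n, updates theta by random-walk Metropolis, and resamples a random subset of the binary "brightness" variables z_n at each iteration. None of these choices are specified in the abstract, and all names (`flymc`, `bright_prob`, `toy_data`, `XI`) are hypothetical. The point it illustrates is that the theta update evaluates exact likelihoods only for the bright data, because the product of all bounds collapses to two sufficient statistics.

```python
import math
import random

XI = 1.5  # assumed point where the bound touches the likelihood (s = +/-XI)
LAM = math.tanh(XI / 2.0) / (4.0 * XI)

def log_sigmoid(s):
    # Numerically stable log of the logistic function.
    return -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))

def log_lik(theta, x, t):
    # Per-datum logistic likelihood L_n(theta), label t in {-1, +1}.
    return log_sigmoid(t * theta * x)

def log_bound(theta, x, t):
    # Jaakkola-Jordan lower bound B_n(theta) <= L_n(theta), quadratic in s.
    s = t * theta * x
    return log_sigmoid(XI) + (s - XI) / 2.0 - LAM * (s * s - XI * XI)

def bright_prob(theta, x, t):
    # P(z_n = 1 | theta) = 1 - B_n / L_n, guarded against rounding error.
    d = min(0.0, log_bound(theta, x, t) - log_lik(theta, x, t))
    return -math.expm1(d)

def log_joint(theta, bright, xs, ts, s1, s2):
    # log p(theta, z | data) up to a theta-independent constant.
    total = -0.5 * theta * theta                          # N(0, 1) prior (assumed)
    total += 0.5 * theta * s1 - LAM * theta * theta * s2  # sum of all log B_n
    for n in bright:                                      # exact terms: bright data only
        d = log_bound(theta, xs[n], ts[n]) - log_lik(theta, xs[n], ts[n])
        if d >= 0.0:
            return -math.inf                              # L_n - B_n = 0
        total += math.log(-math.expm1(d)) - d             # log((L_n - B_n) / B_n)
    return total

def flymc(xs, ts, n_iter=2000, step=0.3, resample_frac=0.1, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    s1 = sum(t * x for x, t in zip(xs, ts))  # sufficient statistics of the bounds
    s2 = sum(x * x for x in xs)
    k = max(1, int(resample_frac * n))
    theta = 0.0
    bright = {i for i in range(n) if rng.random() < bright_prob(theta, xs[i], ts[i])}
    samples, frac_sum = [], 0.0
    for _ in range(n_iter):
        # Random-walk Metropolis on theta given z.
        prop = theta + rng.gauss(0.0, step)
        log_ratio = (log_joint(prop, bright, xs, ts, s1, s2)
                     - log_joint(theta, bright, xs, ts, s1, s2))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta = prop
        # Gibbs update of z_n on a random subset of the data.
        for i in rng.sample(range(n), k):
            if rng.random() < bright_prob(theta, xs[i], ts[i]):
                bright.add(i)
            else:
                bright.discard(i)
        samples.append(theta)
        frac_sum += len(bright) / n
    return samples, frac_sum / n_iter

def toy_data(n, theta_true, seed=0):
    # Synthetic 1-D logistic-regression data, for illustration only.
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ts = [1 if rng.random() < math.exp(log_sigmoid(theta_true * x)) else -1 for x in xs]
    return xs, ts

if __name__ == "__main__":
    xs, ts = toy_data(200, 2.0, seed=1)
    samples, frac = flymc(xs, ts)
    print("posterior mean of theta (second half):", sum(samples[1000:]) / 1000)
    print("average fraction of bright data:", frac)
```

Summing the augmented joint over z_n in {0, 1} gives B_n + (L_n - B_n) = L_n for every datum, which is why the marginal over theta is the exact posterior. How much computation is saved depends on how tight the bound is at typical values of theta, since looser bounds leave more points bright.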
Related articles
Asymptotically Exact, Embarrassingly Parallel MCMC
Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, a...
Inference for Lévy Driven Stochastic Volatility Models Via Adaptive Sequential Monte Carlo
In the following paper we investigate simulation methodology for Bayesian inference in Lévy driven SV models. Typically, Bayesian inference from such statistical models is performed using Markov chain Monte Carlo (MCMC) methods. However, it is well known that fitting SV models using MCMC is not always straightforward. One method that can improve over MCMC is SMC samplers ([14]), but in that ap...
Using Markov chain Monte Carlo for multipoint linkage analysis: Improved estimates of lod scores
The calculation of exact likelihoods from pedigree data is limited to datasets containing either a small number of meioses or a small number of linked genetic loci. In particular, the computation of likelihoods from data collected at multiple loci on large, extended pedigrees is infeasible. We perform multipoint linkage analysis on such datasets by estimating ratios of these otherwise intracta...
Parallel MCMC with generalized elliptical slice sampling
Probabilistic models are conceptually powerful tools for finding structure in data, but their practical effectiveness is often limited by our ability to perform inference in them. Exact inference is frequently intractable, so approximate inference is often performed using Markov chain Monte Carlo (MCMC). To achieve the best possible results from MCMC, we want to efficiently simulate many steps ...
Generalizing Elliptical Slice Sampling for Parallel MCMC
Probabilistic models are conceptually powerful tools for finding structure in data, but their practical effectiveness is often limited by our ability to perform inference in them. Exact inference is frequently intractable, so approximate inference is often performed using Markov chain Monte Carlo (MCMC). To achieve the best possible results from MCMC, we want to efficiently simulate many steps ...